feat(parameters): pluggable secret/param storage with 4 providers#190
feat(parameters): pluggable secret/param storage with 4 providers#190fedemaleh wants to merge 41 commits into
Conversation
| mkdir -p parameters/tests/providers/<provider_name> | ||
| ``` | ||
|
|
||
| `<provider_name>` is `snake_case` and is what users will set in `SECRET_PROVIDER` / `PARAMETER_PROVIDER`. |
There was a problem hiding this comment.
Esto en realidad tiene coincidir con el slug del provider de tipo parameters-storage.
|
|
||
| --- | ||
|
|
||
| ## Step 4: Write `fetch_configuration` (optional) |
There was a problem hiding this comment.
Eso creo que dijimos de sacarlo
| 1. Set the env var: `SECRET_PROVIDER=<provider_name>` and/or `PARAMETER_PROVIDER=<provider_name>`. | ||
| 2. If using `fetch_configuration`, the platform team needs to ensure the fetch mechanism (np CLI, REST endpoint, etc.) returns the JSON shape your provider expects. | ||
|
|
||
| Done. The new provider is reachable from every workflow without any other change. |
There was a problem hiding this comment.
Esto no va más, vamos a sacar esta información del campo provider que viene en el payload.
|
|
||
| A monolithic scope tied to one backend forces fork-and-modify for every variation. This package inverts the relationship: the **dispatch layer is the package**, the **backends are pluggable modules** dropped into `providers/`. | ||
|
|
||
| The platform decides which provider handles each parameter — there is no per-environment / per-agent configuration of "which provider to use". The notification payload carries that information directly. |
There was a problem hiding this comment.
Esto no es cierto. La plataforma decide pero hay configuraciones per-environment, hay que configurarlas vía el provider de tipo parameters-storage
|
|
||
| The provider's configuration travels in the same payload at `provider.attributes`. `build_context` exports it as `PROVIDER_CONFIG` (a JSON string). Each provider's `setup` reads from `PROVIDER_CONFIG` via `get_config_value --provider '.field'` to extract specific fields (region, kms_key_id, etc.). | ||
|
|
||
| This means **there is no per-environment configuration of "which provider"** — the platform decides per-parameter. A single agent can serve parameters routed to Vault and secrets routed to Secrets Manager at the same time, without any agent-side configuration. |
There was a problem hiding this comment.
Lo mismo que antes, se configura vía una configuración.
There was a problem hiding this comment.
Lo de que un mismo agente puede servir múltiples storage es correcto.
| | **`provider.specification_id`** | **UUID identifying which provider handles this parameter** | | ||
| | **`provider.attributes`** | **Provider-specific configuration (region, vault address, etc.)** | | ||
| | `provider.nrn` | Provider-instance NRN (informational) | | ||
| | `provider.dimensions` | Provider-instance dimensions (informational, different from parameter dimensions) | |
There was a problem hiding this comment.
Dejemos muy claro que esto no hay que usarlo
| `build_context` calls: | ||
|
|
||
| ```bash | ||
| np provider specification read --id <provider.specification_id> --output json |
| Two things that used to be design points but are obsolete now: | ||
|
|
||
| - **`SECRET_PROVIDER` / `PARAMETER_PROVIDER` env vars** — not needed. The platform sends `specification_id` per parameter, so there's no global "which provider to use" setting. | ||
| - **`fetch_configuration` scripts per provider** — not needed. Config comes in the payload as `provider.attributes`, no separate fetching step. |
There was a problem hiding this comment.
Borrar esto, esta es la primera implementación, no tiene sentido documentar decisiones intermedias que nunca vieron la luz.
|
|
||
| | Kind | Storage location | This package | | ||
| |-------------|---------------------------------------------------|--------------| | ||
| | Clear-text | nullplatform API (its own datastore) | Not involved | |
There was a problem hiding this comment.
nullplatform API no es clear-text. Creo que los kind son: nullplatform-storage, third-party-storage
| | Clear-text | nullplatform API (its own datastore) | Not involved | | ||
| | Secret | External provider (this package: AWS SM) | Used | | ||
|
|
||
| Only **secret** parameters trigger the `store` / `retrieve` / `delete` workflows. Clear-text values never leave nullplatform and never touch AWS SM. From the operator's perspective: this provider is invisible until a parameter is marked as a secret. |
There was a problem hiding this comment.
Esto es mentira, se puede configurar secret manager para parámetros no secretos, es costoso pero es algo que se puede hacer.
| ``` | ||
| parameters/<external_id> | ||
| ``` | ||
|
|
||
| - **`parameters/`** — fixed prefix. This is the IAM anchor: it lets a single resource ARN pattern (`arn:...:secret:parameters/*`) cover everything this provider manages, and nothing else. Removing the prefix would force IAM to either allow account-wide access or maintain an enumerated list of secret names (impractical, since names are generated at runtime). | ||
| - **`<external_id>`** — UUIDv4 generated by the `store` script via `uuidgen` (with `openssl rand -hex 16` as a portable fallback). This becomes the canonical handle nullplatform persists; subsequent `retrieve` / `delete` actions get it back via `NP_ACTION_CONTEXT.notification.external_id`. |
There was a problem hiding this comment.
Esto no sigue lo que hablamos, el path debería incluir los slugs / ids de entidades y el nombre del param además del uuid
There was a problem hiding this comment.
También documentar en la docu global, los paths donde se guardan las cosas deben seguir el principio rector que si alguien entra al storage layer manualmente (secret manager, vault, etc.) debe poder encontrar lo que busca, siempre tiene que ser human friendly.
| "parameter_id": 42, | ||
| "value": "the-actual-secret", | ||
| "stored_at": "2026-05-05T12:34:56Z", | ||
| "external_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479" |
There was a problem hiding this comment.
Agregaría un "managed_by":"nullplatform"
| 1. **Don't store non-secrets here.** Clear-text parameters belong in nullplatform's API. Mis-classification is the most common cost regression. | ||
| 2. **Delete promptly.** `--force-delete-without-recovery` (already used) ends billing immediately. Without it, AWS keeps charging for the 7–30 day soft-delete window. | ||
| 3. **Avoid replication unless you need DR.** Each replica is a full $0.40/month. | ||
| 4. **Cache at the consumer side.** Each `GetSecretValue` is a billable call. For high-fanout reads (e.g., autoscaling pods all pulling the same value), have a single retrieve at deploy-time and inject as env var, rather than per-pod API calls. |
There was a problem hiding this comment.
Esto quitar, es responsabilidad de la plataforma optimizar esto, no del user.
| ### Comparison with self-hosted Vault | ||
|
|
||
| This isn't apples-to-apples: | ||
|
|
||
| - **Vault**: no per-secret fee. Cost is the underlying infra (EC2/EKS, storage, ops time). Below ~250 secrets, self-hosted Vault tends to be cheaper on paper but more expensive in operator hours. | ||
| - **AWS SM**: linear in secret count, zero ops. Above ~250 secrets the costs converge; the trade-off becomes operational rather than financial. | ||
|
|
There was a problem hiding this comment.
No me interesa esta comparación. Sacar.
|
|
||
| ### Idempotency | ||
|
|
||
| - `delete` is idempotent: it suppresses errors with `|| true` and always returns `{success: true}`. Re-deleting a missing secret is a no-op. |
There was a problem hiding this comment.
Esto es una mala decisión, sólo tiene que hacer supress del error de not found, los otros errores (ej: falta de permiso de IAM deberían reportarse y marcarse como error)
|
|
||
| ### ARN shape | ||
|
|
||
| When you `CreateSecret --name parameters/<uuid>`, AWS appends a random 6-character suffix to the ARN: |
There was a problem hiding this comment.
Más que parameters/ creo que el path debería ser nullplatform/
| ## Wildcards: which ones are OK | ||
|
|
||
| There are two distinct uses of `*` in IAM, frequently conflated: | ||
|
|
||
| | Pattern | Meaning | Allowed here? | | ||
| |------------------------------------------------------|--------------------------------------|---------------| | ||
| | `"Resource": "*"` | All resources of all types in the account | **No** | | ||
| | `"Resource": "arn:...:secret:parameters/*"` | Path glob on the `parameters/` prefix | **Yes** | | ||
|
|
||
| The second is not a privilege escalation — it is the only way to express "all secrets owned by this provider" given that secret names are UUIDs generated at runtime and cannot be enumerated in advance. Avoiding it would force either explicit per-secret policies (impossible for unknown UUIDs) or `Resource: "*"` (much wider). |
There was a problem hiding this comment.
Sólo documentemos la necesidad que es acceder a secrets cuyo inicio es nullplatform/*
|
|
||
| --- | ||
|
|
||
| ## Splitting agent vs consumer |
There was a problem hiding this comment.
Sacar esta sección, es irrelevante para esta implemetnación
|
|
||
| For reference — these are commonly requested but **not** needed by the current scripts and should be denied unless a specific use case is documented: | ||
|
|
||
| - `secretsmanager:PutSecretValue` — would let the agent overwrite values. Not used; secrets are immutable in this design. |
There was a problem hiding this comment.
Hay algo que creo que se está confundiendo.
El modelo de datos es así:
Hay un parameter id (ej: name=DB_PASSWORD, type=secret, id=1) que tiene múltiples values (ej: env:prod value=abcd).
Los values son inmutables, cada vez que se modifica para una misma combinación de dimensiones / scope nrn se genera una nueva versión. El modelo de datos debería guardar dentro del mismo secret todo el versionado, dado que tenemos que soportar ver y restaurar versiones viejas.
…orm-side persistence)
|
|
||
| ### `retrieve` | ||
|
|
||
| Read the value, return `{value}` or `{value: "value not found"}` on miss. |
There was a problem hiding this comment.
Por que no tiras un error si value not found?
…ger + Parameter Store)
…s parameters (works for both AWS providers)
… per-instance for_each)
…ION_ACTION env var
…from CONTEXT.scope.nrn
…entrypoint wall clock
…xt.notification_parse + assume_role_step.resolve_arn markers
…~0.9s on warm calls)
…ials, drop NP_STS_CACHE_DIR/DISABLE env vars
…ullplatform/ prefix, region from IRSA
… doesn't allow '=')
…, action-aware entity fetch (single-wave prefetch)
…l provider schemas
…nfiguration.json.tpl inside
No description provided.